💼 Indian Startup Funding Analysis

👨‍💻 By: Anish Rana

🎯 MCA Data Analysis Project using Python (Pandas • Matplotlib • Seaborn)


🚀 Objective:
To analyze Indian startup funding data and uncover insights about investments, top cities, popular sectors, and key investors that shaped the startup ecosystem.

In [1]:
import pandas as pd
import numpy as np
df = pd.read_csv("startup_funding.csv")
In [2]:
df.head()
Out[2]:
  | Sr No | Date dd/mm/yyyy | Startup Name | Industry Vertical | SubVertical | City  Location | Investors Name | InvestmentnType | Amount in USD | Remarks
0 | 1 | 09/01/2020 | BYJU’S | E-Tech | E-learning | Bengaluru | Tiger Global Management | Private Equity Round | 20,00,00,000 | NaN
1 | 2 | 13/01/2020 | Shuttl | Transportation | App based shuttle service | Gurgaon | Susquehanna Growth Equity | Series C | 80,48,394 | NaN
2 | 3 | 09/01/2020 | Mamaearth | E-commerce | Retailer of baby and toddler products | Bengaluru | Sequoia Capital India | Series B | 1,83,58,860 | NaN
3 | 4 | 02/01/2020 | https://www.wealthbucket.in/ | FinTech | Online Investment | New Delhi | Vinod Khatumal | Pre-series A | 30,00,000 | NaN
4 | 5 | 02/01/2020 | Fashor | Fashion and Apparel | Embroiled Clothes For Women | Mumbai | Sprout Venture Partners | Seed Round | 18,00,000 | NaN
In [3]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3044 entries, 0 to 3043
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Sr No              3044 non-null   int64 
 1   Date dd/mm/yyyy    3044 non-null   object
 2   Startup Name       3044 non-null   object
 3   Industry Vertical  2873 non-null   object
 4   SubVertical        2108 non-null   object
 5   City  Location     2864 non-null   object
 6   Investors Name     3020 non-null   object
 7   InvestmentnType    3040 non-null   object
 8   Amount in USD      2084 non-null   object
 9   Remarks            419 non-null    object
dtypes: int64(1), object(9)
memory usage: 237.9+ KB
In [4]:
df.isnull().sum()
Out[4]:
Sr No                   0
Date dd/mm/yyyy         0
Startup Name            0
Industry Vertical     171
SubVertical           936
City  Location        180
Investors Name         24
InvestmentnType         4
Amount in USD         960
Remarks              2625
dtype: int64
In [5]:
df.shape
Out[5]:
(3044, 10)
In [6]:
df.rename(columns={
    'Sr No':'Sr_No',
    'Date dd/mm/yyyy':'Date',         
    'Startup Name':'StartupName',            
    'Industry Vertical':'IndustryVertical',     
    'SubVertical':'SubVertical',           
    'City  Location':'CityLocation',
    'Investors Name':'InvestorsName',         
    'InvestmentnType':'InvestmentnType',        
    'Amount in USD':'AmountUSD',        
    'Remarks':'Remarks' 
}, inplace = True)

df.columns
Out[6]:
Index(['Sr_No', 'Date', 'StartupName', 'IndustryVertical', 'SubVertical',
       'CityLocation', 'InvestorsName', 'InvestmentnType', 'AmountUSD',
       'Remarks'],
      dtype='object')
In [7]:
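# NOTE: the column is dd/mm/yyyy, but no dayfirst/format is passed here. As the
# later outputs show, days above 12 become NaT and the rest swap day and month.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['Date'].isnull().sum()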
Out[7]:
np.int64(1752)
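The 1,752 missing dates are consistent with a day-first column being parsed month-first: `09/01/2020` shows up as `2020-09-01` in the later `df.head()` output, and `13/01/2020` shows up as `NaT`. The following is a minimal sketch on three hand-made sample strings, assuming the column is uniformly `dd/mm/yyyy`; the explicit `format` strings are used only to make the two interpretations reproducible.

```python
import pandas as pd

# Hand-made sample in dd/mm/yyyy form (illustrative values, not the full dataset).
raw = pd.Series(["09/01/2020", "13/01/2020", "02/01/2020"])

# Month-first interpretation: "13/01/2020" has no valid month 13, so it becomes NaT,
# and "09/01/2020" is silently read as 1 September instead of 9 January.
month_first = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# Day-first interpretation with an explicit format, matching the column header.
day_first = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print("NaT with month-first parsing:", month_first.isna().sum())
print("NaT with day-first parsing:  ", day_first.isna().sum())
print(day_first.dt.strftime("%Y-%m-%d").tolist())
```

On the real column, any values that still fail under the day-first format would be genuinely malformed entries and are worth inspecting before they are dropped.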
In [8]:
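df['CityLocation'] = df['CityLocation'].str.strip()
# NOTE: replace() returns a new Series; the result is not assigned here, so
# df is unchanged (the output below still shows 'Gurgaon').
df['CityLocation'].replace({
    'Bangalore':'Bengaluru',
    'Delhi':'New Delhi',
    'Bombay':'Mumbai',
    'Gurgaon':'Gurugram'
})
# NOTE: dropna(inplace=True) on a column selection does not drop rows from df.
df['CityLocation'].dropna(inplace=True)
df['CityLocation'].unique()[:10]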
Out[8]:
array(['Bengaluru', 'Gurgaon', 'New Delhi', 'Mumbai', 'Chennai', 'Pune',
       'Noida', 'Faridabad', 'San Francisco', 'San Jose,'], dtype=object)
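Because `Series.replace` returns a new Series, a mapping only takes effect when the result is assigned back. A minimal sketch on hand-made values follows; the sample city strings and the `mapping` dict are illustrative, not taken from the dataset.

```python
import pandas as pd

# Hand-made sample with stray whitespace, old spellings, and a missing value.
cities = pd.Series([" Bangalore", "Delhi", "Gurgaon ", "Mumbai", None])

# Illustrative mapping from old or variant spellings to one canonical name.
mapping = {
    "Bangalore": "Bengaluru",
    "Delhi": "New Delhi",
    "Bombay": "Mumbai",
    "Gurgaon": "Gurugram",
}

stripped = cities.str.strip()

# Calling replace() without assignment leaves `stripped` untouched.
stripped.replace(mapping)
unchanged = stripped.tolist()

# Assigning the result is what actually applies the mapping.
cleaned = stripped.replace(mapping)

print(unchanged[:4])
print(cleaned.tolist()[:4])
```

Mapping `Delhi` to `New Delhi` (with a space) keeps it in the same group as the existing `New Delhi` rows instead of creating a third spelling.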
In [9]:
df['AmountUSD'] = df['AmountUSD'].replace(',','',regex=True)
df['AmountUSD'] = pd.to_numeric(df['AmountUSD'], errors = 'coerce')
df['AmountUSD'].head()
Out[9]:
0    200000000.0
1      8048394.0
2     18358860.0
3      3000000.0
4      1800000.0
Name: AmountUSD, dtype: float64
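The non-null count for this column goes from 2,084 in the first `df.info()` to 2,065 rows after the later `dropna`, which suggests 19 entries were non-numeric and were coerced to `NaN`. A small sketch of how such entries could be listed before they are dropped; the sample strings, including the `"undisclosed"` label, are assumptions for illustration.

```python
import pandas as pd

# Hand-made sample: Indian digit grouping, a non-numeric label, and a missing value.
amounts = pd.Series(["20,00,00,000", "80,48,394", "undisclosed", None])

# Remove the grouping commas, then convert; anything unparseable becomes NaN.
cleaned = pd.to_numeric(amounts.str.replace(",", "", regex=False), errors="coerce")

# Entries that had a value originally but failed the numeric conversion.
coerced = amounts.notna() & cleaned.isna()

print(cleaned.tolist())
print(amounts[coerced].tolist())
```

Inspecting `amounts[coerced]` on the real column shows whether the lost rows are placeholders or fixable formatting problems.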
In [10]:
#df.drop(['Remarks'], axis = 1, inplace =True)
In [11]:
df['IndustryVertical']=df['IndustryVertical'].fillna('Unknown')
df['InvestmentnType']=df['InvestmentnType'].fillna('Undisclosed')
df['InvestorsName']=df['InvestorsName'].fillna('Unknown Investor')
df = df.dropna(subset=['AmountUSD'])
In [12]:
df.info()
df.isnull().sum()
<class 'pandas.core.frame.DataFrame'>
Index: 2065 entries, 0 to 3043
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Sr_No             2065 non-null   int64         
 1   Date              887 non-null    datetime64[ns]
 2   StartupName       2065 non-null   object        
 3   IndustryVertical  2065 non-null   object        
 4   SubVertical       1418 non-null   object        
 5   CityLocation      1930 non-null   object        
 6   InvestorsName     2065 non-null   object        
 7   InvestmentnType   2065 non-null   object        
 8   AmountUSD         2065 non-null   float64       
 9   Remarks           337 non-null    object        
dtypes: datetime64[ns](1), float64(1), int64(1), object(7)
memory usage: 177.5+ KB
Out[12]:
Sr_No                  0
Date                1178
StartupName            0
IndustryVertical       0
SubVertical          647
CityLocation         135
InvestorsName          0
InvestmentnType        0
AmountUSD              0
Remarks             1728
dtype: int64
In [13]:
df = df.dropna(subset=['CityLocation'])
df['CityLocation'].unique()[:10]
Out[13]:
array(['Bengaluru', 'Gurgaon', 'New Delhi', 'Mumbai', 'Chennai', 'Pune',
       'Noida', 'Faridabad', 'San Francisco', 'San Jose,'], dtype=object)
In [14]:
df.head()
Out[14]:
  | Sr_No | Date | StartupName | IndustryVertical | SubVertical | CityLocation | InvestorsName | InvestmentnType | AmountUSD | Remarks
0 | 1 | 2020-09-01 | BYJU’S | E-Tech | E-learning | Bengaluru | Tiger Global Management | Private Equity Round | 200000000.0 | NaN
1 | 2 | NaT | Shuttl | Transportation | App based shuttle service | Gurgaon | Susquehanna Growth Equity | Series C | 8048394.0 | NaN
2 | 3 | 2020-09-01 | Mamaearth | E-commerce | Retailer of baby and toddler products | Bengaluru | Sequoia Capital India | Series B | 18358860.0 | NaN
3 | 4 | 2020-02-01 | https://www.wealthbucket.in/ | FinTech | Online Investment | New Delhi | Vinod Khatumal | Pre-series A | 3000000.0 | NaN
4 | 5 | 2020-02-01 | Fashor | Fashion and Apparel | Embroiled Clothes For Women | Mumbai | Sprout Venture Partners | Seed Round | 1800000.0 | NaN
In [15]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1930 entries, 0 to 2872
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Sr_No             1930 non-null   int64         
 1   Date              837 non-null    datetime64[ns]
 2   StartupName       1930 non-null   object        
 3   IndustryVertical  1930 non-null   object        
 4   SubVertical       1416 non-null   object        
 5   CityLocation      1930 non-null   object        
 6   InvestorsName     1930 non-null   object        
 7   InvestmentnType   1930 non-null   object        
 8   AmountUSD         1930 non-null   float64       
 9   Remarks           279 non-null    object        
dtypes: datetime64[ns](1), float64(1), int64(1), object(7)
memory usage: 165.9+ KB
In [16]:
unique_startup =df['StartupName'].nunique()
total_funding = df['AmountUSD'].sum()
date_min = df['Date'].min()
date_max = df['Date'].max()
In [17]:
print(unique_startup)
1592
In [18]:
print(total_funding)
36785873996.22
In [19]:
print(date_min,"to",date_max)
2015-01-05 00:00:00 to 2020-10-01 00:00:00
In [20]:
top_startups = df.groupby('StartupName')['AmountUSD'].sum().sort_values(ascending=False).head(10)
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df[['Date','Year','Month']].head()
Out[20]:
  | Date | Year | Month
0 | 2020-09-01 | 2020.0 | 9.0
1 | NaT | NaN | NaN
2 | 2020-09-01 | 2020.0 | 9.0
3 | 2020-02-01 | 2020.0 | 2.0
4 | 2020-02-01 | 2020.0 | 2.0
In [21]:
print(top_startups)
StartupName
Flipkart            4.059700e+09
Rapido Bike Taxi    3.900000e+09
Paytm               3.148950e+09
Ola                 9.845000e+08
Udaan               8.700000e+08
Snapdeal            7.000000e+08
Flipkart.com        7.000000e+08
Ola Cabs            6.697000e+08
True North          6.000000e+08
BigBasket           5.070000e+08
Name: AmountUSD, dtype: float64
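The ranking above lists `Flipkart` and `Flipkart.com`, and `Ola` and `Ola Cabs`, as separate startups. If each pair is assumed to refer to the same company, the names can be merged before grouping. The sketch below reuses the four totals printed above; the `aliases` dict is a hypothetical mapping that would need to be built and checked by hand.

```python
import pandas as pd

# Totals copied from the printed ranking for the four affected names.
totals = pd.DataFrame({
    "StartupName": ["Flipkart", "Flipkart.com", "Ola", "Ola Cabs"],
    "AmountUSD": [4.0597e9, 7.0e8, 9.845e8, 6.697e8],
})

# Hypothetical alias map: variant spelling -> canonical name (an assumption).
aliases = {"Flipkart.com": "Flipkart", "Ola Cabs": "Ola"}

totals["StartupName"] = totals["StartupName"].str.strip().replace(aliases)

# Re-aggregate so each company appears once.
merged = (
    totals.groupby("StartupName")["AmountUSD"]
    .sum()
    .sort_values(ascending=False)
)
print(merged)
```

Applied to the full DataFrame before the `groupby`, the same step would change both the totals and the positions in the top 10.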
In [22]:
import matplotlib.pyplot as plt
In [23]:
year=df.groupby('Year')['AmountUSD'].sum()
plt.figure(figsize=(10,7))
year.plot(kind='bar',color='skyblue',edgecolor='black')
plt.title("Fundings by Year")
plt.xlabel("Year")
plt.ylabel("AmountUSD")
plt.xticks(rotation=45)
plt.grid(axis='y',linestyle='--',alpha=0.7)
plt.show()
[Figure: bar chart "Fundings by Year" (x: Year, y: AmountUSD)]
In [24]:
### Insight about the visualisation (Funding by Year)
# The chart plots total funding (the sum of AmountUSD) per year, not the number of funded startups.
# Total funding peaked in 2017, indicating a major boom period for startup investment in India.
# After 2017, funding declined noticeably, although 2019 saw a small rebound.
# The 2017 peak may reflect the rise of new-age tech startups and government initiatives promoting entrepreneurship at the time.
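Total funding per year and the number of funded startups per year are different measures and can peak in different years. A minimal sketch that computes both side by side, using made-up rows purely to show the aggregation (the column names follow the notebook, the values do not come from the dataset):

```python
import pandas as pd

# Made-up deals: 2016 has more deals, 2017 has more money.
deals = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2017, 2017],
    "StartupName": ["A", "B", "C", "D", "A"],
    "AmountUSD": [1e6, 2e6, 3e6, 5e7, 1e7],
})

# Named aggregation: one row per year with three different measures.
per_year = deals.groupby("Year").agg(
    total_funding=("AmountUSD", "sum"),
    deal_count=("AmountUSD", "size"),
    startups_funded=("StartupName", "nunique"),
)
print(per_year)
```

Plotting `deal_count` next to `total_funding` makes it clear whether a peak comes from many deals or from a few very large ones.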
In [25]:
import seaborn as sns
In [26]:
import plotly.express as px
In [27]:
yearly_funding = df.groupby('Year')['AmountUSD'].sum().reset_index()

fig = px.bar(
    yearly_funding,
    x='Year',
    y='AmountUSD',
    title='TOTAL STARTUP Funding By Year in INDIA',
    text_auto='.2s',
    color='AmountUSD',
    color_continuous_scale='Viridis'
)
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Total Funding (USD)',
    template='plotly_white'
)
fig.show()

Insight about the visualisation (Total Startup Funding by Year in India)

The chart shows that 2017 had the highest total funding, around $4.7 billion, making it the peak year for startup investment in India.

After that, funding fluctuated, with a moderate recovery in 2019 ($2.9B) and a sharp decline in 2020 ($370M), likely due to the global economic slowdown and the pandemic. Note that the 2020 total covers only part of the year, since the latest parsed date in the data is 2020-10-01.

Overall, the graph indicates that 2015–2017 was a strong growth phase for the Indian startup ecosystem, driven by investor optimism and rapid innovation.

In [28]:
import plotly.express as px
top_investors = (
    df.groupby('InvestorsName')['AmountUSD']
    .sum()
    .sort_values(ascending=False)
    .head(10)
    .reset_index()
)

fig = px.bar(
    top_investors,
    x='InvestorsName',
    y='AmountUSD',
    title='Top 10 Investors in Indian Startups',
    text_auto='.2s',
    color='AmountUSD',
    color_continuous_scale='Sunset'
)

fig.update_layout(xaxis_title='Investor', yaxis_title='Total Funding (USD)', template='plotly_white')
fig.show()

Insight about the visualisation (Top 10 Investors in Indian Startups)

From the plot above, Westbridge Capital leads with around $3.9B invested in Indian startups, followed by Softbank with about $2.5B. Softbank Group, which appears as a separate entry, ranks third.

In [29]:
df['CityLocation'] = df['CityLocation'].str.strip()
# Assign the result back instead of calling replace(inplace=True) on a column
# selection, which triggers a chained-assignment FutureWarning in pandas.
df['CityLocation'] = df['CityLocation'].replace({
    'Bangalore':'Bengaluru',
    'Delhi':'New Delhi',
    'Bombay':'Mumbai',
    'Gurgaon':'Gurugram'
})
df['CityLocation'].unique()[:10]
Out[29]:
array(['Bengaluru', 'Gurugram', 'New Delhi', 'Mumbai', 'Chennai', 'Pune',
       'Noida', 'Faridabad', 'San Francisco', 'San Jose,'], dtype=object)
In [30]:
top_cities_df = df.groupby('CityLocation')['AmountUSD'].sum().sort_values(ascending=False).head(10).reset_index()
fig = px.bar(
    top_cities_df,
    x='CityLocation',
    y='AmountUSD',
    title='Top 10 Startup Funding Cities in INDIA',
    text_auto='.2s',
    color='AmountUSD',
    color_continuous_scale='Plasma'
)
fig.update_layout(xaxis_title='City',yaxis_title='AmountUSD',template='plotly_white')
fig.show()

Insight about the visualisation (Top 10 Startup Funding Cities in India)

The plot shows a highly concentrated funding environment: Bengaluru is a clear outlier and the dominant hub for startup capital in India.

The other major metropolitan centres (Mumbai, Gurugram, New Delhi) are its closest competitors, while the rest of the country attracts significantly less capital.
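How concentrated the funding is can be expressed as each city's share of the total and the running cumulative share. A minimal sketch with made-up city totals; the numbers are placeholders to show the calculation, not values from the chart.

```python
import pandas as pd

# Placeholder totals per city (arbitrary units, not real figures).
city_totals = pd.Series(
    {"Bengaluru": 50.0, "Mumbai": 20.0, "Gurugram": 15.0, "New Delhi": 10.0, "Pune": 5.0},
    name="AmountUSD",
)

# Share of the overall total, largest first, plus the running cumulative share.
shares = (city_totals / city_totals.sum()).sort_values(ascending=False)
summary = pd.DataFrame({"share": shares, "cumulative_share": shares.cumsum()})
print(summary)
```

On the real data, `city_totals` would be the result of `df.groupby('CityLocation')['AmountUSD'].sum()`, and the cumulative column shows how many cities account for most of the capital.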

In [31]:
top_startups_10 = df.groupby('StartupName')['AmountUSD'].sum().sort_values(ascending=False).head(10).reset_index()
In [32]:
fig = px.bar(
    top_startups_10,
    x='StartupName',
    y='AmountUSD',
    title='Top 10 Startups in INDIA',
    text_auto='.2s',
    color='AmountUSD',
    color_continuous_scale='Plasma'
)
fig.update_layout(xaxis_title='StartupName',yaxis_title='AmountUSD',template='plotly_white')
fig.show()

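Insight about the visualisation (Top 10 Startups in India)

Capital is highly concentrated in a small group of market-leading companies: Flipkart, Rapido Bike Taxi, and Paytm each account for more than $3 billion in the totals printed earlier. Together with Bengaluru's dominance among cities, this gives the Indian startup funding environment a "hub-and-spoke" shape. Note that Flipkart and Flipkart.com, and Ola and Ola Cabs, are counted as separate startups in this ranking.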

In [33]:
import matplotlib.pyplot as plt
import seaborn as sns
In [34]:
sns.set(style="whitegrid")
In [35]:
top_investors = (
    df.groupby('InvestorsName')['AmountUSD']
    .sum()
    .sort_values(ascending=False)
    .head(10)
    .reset_index()
)
plt.figure(figsize=(10,6))
# Assign hue explicitly; passing palette without hue is deprecated in seaborn.
sns.barplot(data=top_investors, x='AmountUSD', y='InvestorsName',
            hue='InvestorsName', palette='coolwarm', legend=False)
plt.title('Top 10 Investors by Total Funding', fontsize=14)
plt.xlabel('Total Funding in USD')
plt.ylabel('Investor Name')
plt.show()
[Figure: horizontal bar chart "Top 10 Investors by Total Funding"]

Insight

Westbridge and Softbank at the top point to two contrasting styles of large-scale funding in the Indian startup market:

  • Patient, domestic growth (Westbridge): a focus on long-term value creation across both private and public markets.
  • Aggressive, global scale (Softbank): a focus on rapid, high-valuation growth to dominate large, tech-enabled sectors.

In [36]:
top_sectors = (
    df.groupby('IndustryVertical')['AmountUSD']
    .sum()
    .sort_values(ascending=False)
    .head(10)
    .reset_index()
)

plt.figure(figsize=(10,6))
# Assign hue explicitly; passing palette without hue is deprecated in seaborn.
sns.barplot(data=top_sectors, x='AmountUSD', y='IndustryVertical',
            hue='IndustryVertical', palette='magma', legend=False)
plt.title('Top 10 Startup Sectors by Funding', fontsize=14)
plt.xlabel('Total Funding in USD')
plt.ylabel('Sector')
plt.show()
[Figure: horizontal bar chart "Top 10 Startup Sectors by Funding"]

Insight

The Indian startup funding ecosystem follows a "go big or go home" investment thesis. The market favors large-scale bets on consumer-facing digital platforms, which concentrates capital in a few billion-dollar companies and a few major metropolitan hubs.

In [37]:
pivot_data = df.pivot_table(
    index='InvestorsName',
    columns='IndustryVertical',
    values='AmountUSD',
    aggfunc='sum'
).fillna(0)

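# Select the 10 investors with the largest total funding; pivot_data.head(10)
# would only return the first 10 investor names in alphabetical order.
top10_investors = pivot_data.sum(axis=1).nlargest(10).index
plt.figure(figsize=(12,8))
sns.heatmap(pivot_data.loc[top10_investors], cmap='YlGnBu')
plt.title('Top Investors vs Sectors (Heatmap)', fontsize=14)
plt.show()
[Figure: heatmap "Top Investors vs Sectors"]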

Insights

Although the heatmap covers the full list of sectors, most investment capital (and therefore most major deals) sits in a very small subset of the possible investor-sector combinations. The visible sparsity suggests that most investors specialize, or that the biggest deals are confined to a handful of sectors and funds.
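With every sector as a column, the heatmap is easier to read when both axes are limited to the largest totals. `pivot_table` sorts its index alphabetically, so `head(n)` returns the first `n` names rather than the largest investors. The sketch below selects the top rows and columns by their sums; the investor and sector names and amounts are made up for illustration.

```python
import pandas as pd

# Made-up deals; "Aardvark" sorts first alphabetically but is the smallest investor.
deals = pd.DataFrame({
    "InvestorsName": ["Alpha", "Alpha", "Beta", "Gamma", "Aardvark"],
    "IndustryVertical": ["FinTech", "E-commerce", "FinTech", "Health", "Food"],
    "AmountUSD": [5e8, 3e8, 2e8, 1e8, 1e6],
})

pivot = deals.pivot_table(
    index="InvestorsName",
    columns="IndustryVertical",
    values="AmountUSD",
    aggfunc="sum",
    fill_value=0,
)

n = 2  # number of investors and sectors to keep
top_investors = pivot.sum(axis=1).nlargest(n).index  # largest row totals
top_sectors = pivot.sum(axis=0).nlargest(n).index    # largest column totals

subset = pivot.loc[top_investors, top_sectors]
print(subset)
```

Passing `subset` to `sns.heatmap` instead of the full pivot keeps the plot to an `n` by `n` grid of the largest investors and sectors.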

🧾 Indian Startup Funding Analysis

by Anish Rana

MCA Data Analysis Project (using Python, Pandas, Matplotlib, Seaborn)

This project analyzes Indian startup funding data to uncover patterns in funding amounts, top investors, active startup cities, and trending sectors over the years. The goal is to gain insights into how the startup ecosystem in India has evolved.

🧽 Data Cleaning Summary

  • Removed rows with missing or non-numeric funding amounts and rows without a city.
  • Standardized city names.
  • Converted AmountUSD into numeric values for analysis.
  • Prepared data for visualization and insights.

🔍 Key Insights from the Analysis

  1. Yearly Funding Trend:
    2017 had the highest investment volume, marking the startup boom in India.

  2. Top Cities:
    Bengaluru, Mumbai, and Delhi NCR are the top destinations for startup funding.

  3. Top Sectors:
    FinTech, E-commerce, and SaaS lead the charts in terms of funding received.

  4. Funding Rounds:
    Seed and Series A rounds dominate, showing strong early-stage growth activity.

  5. Investors:
    Sequoia Capital, Accel, and Kalaari Capital are the most active investors.
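The funding-round and "most active investor" points are count-based, whereas the charts in this notebook rank by total amount. A minimal sketch of how deal counts could be checked, using made-up rows; the fund names and round labels are placeholders, and lower-casing the round labels is an assumption about how variants should be merged.

```python
import pandas as pd

# Made-up deals with inconsistent round labels.
deals = pd.DataFrame({
    "InvestmentnType": ["Seed Round", "Seed Funding", "Series A", "seed round", "Series B"],
    "InvestorsName": ["Fund X", "Fund Y", "Fund X", "Fund X", "Fund Z"],
    "AmountUSD": [1e6, 5e5, 8e6, 7e5, 9e7],
})

# Number of deals per round type, after normalizing case and whitespace.
round_counts = deals["InvestmentnType"].str.strip().str.lower().value_counts()

# Deals and total amount per investor, ordered by number of deals.
activity = (
    deals.groupby("InvestorsName")["AmountUSD"]
    .agg(deal_count="size", total_usd="sum")
    .sort_values("deal_count", ascending=False)
)

print(round_counts)
print(activity)
```

In this sample the most active investor by deal count (`Fund X`) is not the largest by amount (`Fund Z`), which is why the two rankings should be reported separately.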

🏁 Conclusion

The analysis shows how India's startup ecosystem matured rapidly between 2015 and 2019.
Investment was heavily concentrated in metro areas like Bengaluru and Mumbai, while FinTech and E-commerce became dominant sectors.

Despite a funding dip post-2017, the ecosystem remains strong with increasing investor participation and startup innovation.
This highlights India’s position as one of the world’s fastest-growing startup hubs.


🔗 Connect with me:
📧 Email: ar689356@gmail.com
💼 LinkedIn: Anish Rana
💻 GitHub: Anish20cs12
